Technotools

home *** CD-ROM | disk | FTP | other *** search

/ Technotools / Technotools (Chestnut CD-ROM)(1993).ISO / unix / minix / minix.txt < prev next >

Wrap

Text File | 1979-12-31 | 18.4 KB | 462 lines

MINIX: A CHEAP UNIX CLONE WITH SOURCE CODE Andrew S. Tanenbaum Dept. of Mathematics and Computer Science Vrije Universiteit Amsterdam, The Netherlands 1. OVERVIEW OF THE MINIX SYSTEM ARCHITECTURE UNIX- is organized as a single executable program that is loaded into memory at system boot time and then run. MINIX is structured in a much more modular way, as a collec- tion of processes that communicate with each other and with user processes by sending and receiving messages. There are separate processes for the memory manager, the file system, for each device driver, and for certain other system func- tions. This structure enforces a better interface between the pieces. The file system cannot, for example, acciden- tally change the memory manager's tables because the file system and memory manager each have their own private address spaces. These system processes are each full-fledged processes, with their own memory allocation, process table entry and state. They can be run, blocked, and send messages, just as the user processes. In fact, the memory manager and file system each run in user space as ordinary processes. The device drivers are all linked together with the kernel into the same binary program, but they communicate with each other and with the other processes by message passing. When the system is compiled, four binary programs are independently created: the kernel (including the driver processes), the memory manager, the file system, and init (which reads /etc/ttys and forks off the login processes). In other words, compiling the system results in four dis- tinct a.out files. When the system is booted, all four of these are read into memory from the boot diskette. It is possible, and in fact, normal, to modify, recom- pile, and relink, say, the file system, without having to relink the other three pieces. This design provides a high degree of modularity by dividing the system up into indepen- dent pieces, each with a well-defined function and interface to the other pieces. The pieces communicate by sending and receiving messages. The various processes are structured in four layers: 4. The user processes (top layer). 3. The server processes (memory manager and file system). _________________________ - UNIX is a trademark of AT&T Bell Laboratories. 2. The device drivers, one process per device. 1. Process and message handling (bottom layer). Let us now briefly summarize the function of each layer. Layer 1 is concerned with doing process management including CPU scheduling and interprocess communication. When a process does a SEND or RECEIVE, it traps to the ker- nel, which then tries to execute the command. If the com- mand cannot be executed (e.g., a process does a RECEIVE and there are no messages waiting for it), the caller is blocked until the command can be executed, at which time the process is reactivated. When an interrupt occurs, layer 1 converts it into a message to the appropriate device driver, which will normally be blocked waiting for it. The decision about which process to run when is also made in layer 1. A prior- ity algorithm is used, giving device drivers higher priority over ordinary user processes, for example. Layer 2 contains the device drivers, one process per major device. These processes are part of the kernel's address space because they must run in kernel mode to access I/O device registers and execute I/O instructions. Although the IBM PC does not have user mode/kernel mode, most other machines do, so this decision has been made with an eye toward the future. To distinguish the processes within the kernel from those in user space, the kernel processes are called tasks. Layer 3 contains only two processes, the memory manager and the file system. They are both structured as servers, with the user processes as clients. When a user process (i.e., a client) wants to execute a system call, it calls, for example, the library procedure read with the file descriptor, buffer, and count. The library procedure builds a message containing the system call number and the parame- ters and sends it to the file system. The client then blocks waiting for a reply. When the file system receives the message, it carries it out and sends back a reply con- taining the number of bytes read or the error code. The library procedure gets the reply and returns the result to the caller in the usual way. The user is completely unaware of what is going on here, making it easy to replace the local file system with a remote one. Layer 4 contains the user programs. When the system comes up, init forks off login processes, which then wait for input. On a successful login, the shell is executed. Processes can fork, resulting in a tree of processes, with init at the root. When CTRL-D is typed to a shell, it exits, and init replaces the shell with another login pro- cess. 2. LAYER 1 - PROCESSES AND MESSAGES The two basic concepts on which MINIX is built are processes and messages. A process is an independently schedulable entity with its own process table entry. A mes- sage is a structure containing the sender's process number, - 2 - a message type field, and a variable part (a union) contain- ing the parameters or reply codes of the message. Message size is fixed, depending on how big the union happens to be on the machine in question. On the IBM PC it is 24 bytes. Three kernel calls are provided: - RECEIVE(source, &message) - SEND(destination, &message) - SENDREC(process, &message) These are the only true system calls (i.e., traps to the kernel). RECEIVE announces the willingness of the caller to accept a message from a specified process, or ANY, if the RECEIVER will accept any message. (From here on, ``pro- cess'' also includes the tasks.) If no message is available, the receiving process is blocked. SEND attempts to transmit a message to the destination process. If the destination process is currently blocked trying to receive from the sender, the kernel copies the message from the sender's buffer to the receiver's buffer, and then marks them both as runnable. If the receiver is not waiting for a message from the sender, the sender is blocked. The SENDREC primitive combines the functions of the other two. It sends a message to the indicated process, and then blocks until a reply has been received. The reply overwrites the original message. User processes use SENDREC to execute system calls by sending messages to the servers and then blocking until the reply arrives. There are two ways to enter the kernel. One way is by the trap resulting from a process' attempt to send or receive a message. The other way is by an interrupt. When an interrupt occurs, the registers and machine state of the currently running process are saved in its process table entry. Then a general interrupt handler is called with the interrupt number as parameter. This procedure builds a mes- sage of type INTERRUPT, copies it to the buffer of the wait- ing task, marks that task as runnable (unblocked), and then calls the scheduler to see who to run next. The scheduler maintains three queues, corresponding to layers 2, 3, and 4, respectively. The driver queue has the highest priority, the server queue has middle priority, and the user queue has lowest priority. The scheduling algo- rithm is simple: find the highest priority queue that has at least one process on it, and run the first process on that queue. In this way, a clock interrupt will cause a process switch if the file system was running, but not if the disk driver was running. If the disk driver was running, the clock task will be put at the end of the highest priority queue, and run when its turn comes. In addition to this rule, once every 100 msec, the clock task checks to see if the current process is a user process that has been running for at least 100 msec. If so, that user is removed from the front of the user queue and put on the back. In effect, compute bound user processes - 3 - are run using a round robin scheduler. Once started, a user process runs until either it blocks trying to send or receive a message, or it has had 100 msec of CPU time. This algorithm is simple, fair, and easy to implement. 3. LAYER 2 - DEVICE DRIVERS Like all versions of UNIX for the IBM PC, MINIX does not use the ROM BIOS for input or output because the BIOS does not support interrupts. Suppose a user forks off a compilation in the background and then calls the editor. If the editor tried to read from the terminal using the BIOS, the compilation (and any other background jobs such as printing) would be stopped dead in their tracks waiting for the the next character to be typed. Such behavior may be acceptable in the MS-DOS world, but it certainly is not in the UNIX world. As a result, MINIX contains a complete set of drivers that duplicate the functions of the BIOS. Like the rest of MINIX, these drivers are written in C, not assembly language. This design has important implications for running MINIX on PC clones. A clone whose hardware is not compati- ble with the PC down to the chip level, but which tries to hide the differences by making the BIOS calls functionally identical to IBM's will not run an unmodified MINIX because MINIX does not use the BIOS. Each device driver is a separate process in MINIX. At present, the drivers include the clock driver, terminal driver, various disk drivers (e.g., RAM disk, floppy disk), and printer driver. Each driver has a main loop consisting of three actions: 1. Wait for an incoming message. 2. Perform the request contained in the message. 3. Send a reply message. Request messages have a standard format, containing the opcode (e.g., READ, WRITE, or IOCTL), the minor device number, the position (e.g., disk block number), the buffer address, the byte count, and the number of the process on whose behalf the work is being done. As an example of where device drivers fit in, consider what happens when a user wants to read from a file. The user sends a message to the file system. If the file system has the needed data in its buffer cache, they are copied back to the user. Otherwise, the file system sends a mes- sage to the disk task requesting that the block be read into a buffer within the file system's address space (in its cache). Users may not send messages to the tasks directly. Only the servers may do this. MINIX supports a RAM disk. In fact, the RAM disk is always used to hold the root device. When the system is booted, after the operating system has been loaded, the user is instructed to insert the root file system diskette. The file system then sees how big it is, allocates the necessary - 4 - memory, and copies the diskette to the RAM disk. Other file systems can then be mounted on the root device. This organization puts important directories such as /bin and /tmp on the fastest device, and also makes it easy to work with either floppy disks or hard disks or a mixture of the two by mounting them on /usr or /user or elsewhere. In any event, the root device is always in the same place. In the standard distribution, the RAM disk is about 240K, most of which is full of parts of the C compiler. In the 256K system, a much smaller RAM disk has to be used, which explains why this version has no C compiler: there is no place to put it. (The /usr diskette is completely full with the other utility programs and one of the design goals was to make the system run on a 256K PC with 1 floppy disk.) Users with an unusual configuration such as 256K and three hard disks are free to juggle things around as they see fit. The terminal driver is compatible with the standard V7 terminal driver. It supports cooked mode, raw mode, and cbreak mode. It also supports several escape sequences, such as cursor positioning and reverse scrolling because the screen editor needs them. The printer driver copies its input to the printer character for character without modification. It does not even convert line feed to carriage return + line feed. This makes it possible to send escape sequences to graphics printers without the driver messing things up. MINIX does not spool output because floppy disk systems rarely have enough spare disk space for the spooling directory. Instead one normally would print a file f by saying lpr <f & to do the printing in the background. The lpr program insert carriage returns, expands tabs, and so on, so it should only be used for straight ASCII files. On hard disk systems, a spooler would not be difficult to write. 4. LAYER 3 - SERVERS Layer 3 contains two server processes: the memory manager and the file system. They are both structured in the same way as the device drivers, that is a main loop that accepts requests, performs them, and then replies. We will now look at each of these in turn. The memory manager's job is to handle those system calls that affect memory allocation, as well as a few oth- ers. These include FORK, EXEC, WAIT, KILL, and BRK. The memory model used by MINIX is exceptionally simple in order to accommodate computers without any memory management hardware. When the shell forks off a process, a copy of the shell is made in memory. When the child does an EXEC, the new core image is placed in memory. Thereafter it is never moved. MINIX does not swap or page. The amount of memory allocated to the process is deter- mined by a field in the header of the executable file. A - 5 - program, chmem, has been provided to manipulate this field. When a process is started, the text segment is set at the very bottom of the allocated memory area, followed by the data and bss. The stack starts at the top of the allocated memory and grows downward. The space between the bottom of the stack and the top of the data segment is available for both segments to grow into as needed. If the two segments meet, the process is killed. In the past, before paging was invented, all memory allocation schemes worked like this. In the future, when even small microcomputers will use 32-bit CPUs and 1M x 1 bit memory chips, the minimum feasible memory will be 4 megabytes and this allocation scheme will probably become popular again due to its inherent simplicity. Thus the MINIX scheme can be regarded as either hopelessly outdated or amazingly futuristic, as you prefer. The memory manager keeps track of memory using a list of holes. When new memory is needed, either for FORK or for EXEC, it searches the hole list and takes the first hole that is big enough (first fit). When a process terminates, if it is adjacent to a hole on either side, the process' memory and the hole are merged into a bigger hole. The file system is really a remote file server that happens to be running on the user's machine. However it is straightforward to convert it into a true network file server. All that needs to be done is change the message interface and provide some way of authenticating requests. (In MINIX, the source field in the incoming message is trustworthy because it is filled in by the kernel.) When running remote, the MINIX file server maintains state infor- mation, like RFS and unlike NFS. The MINIX file system is similar to that of V7 UNIX. The i-node is slightly different, containing only 9 disk addresses instead of 13, and only 1 time instead of 3. These changes reduce the i-node from 64 bytes to 32 bytes, to store more i-nodes per disk block and reduce the size of the in-core i-node table. Free disk blocks and free inodes are kept track of using bit maps rather than free lists. The bit maps for the root device and all mounted file systems are kept in memory. When a file grows, the system makes a definite effort to allocate the new block as close as possible to the old ones, to minimize arm motion. Disk storage is not necessarily allocated one block at a time. A minor device can be con- figured to allocate 2, 4 (or more) contiguous blocks when- ever a block is allocated. Although this wastes disk space, these multiblock zones improve disk performance by keeping file blocks close together. The standard parameters for MINIX as distributed are 1K blocks and 1K zones (i.e., just 1 block per zone). MINIX maintains a buffer cache of recently used blocks. A hashing algorithm is used to look up blocks in the cache. When an i-node block, directory block, or other critical block is modified, it is written back to disk immediately. - 6 - Data blocks are only written back at the next SYNC or when the buffer is needed for something else. The MINIX directory system and format is identical to that of V7 UNIX. File names are strings of up to 14 charac- ters, and directories can be arbitrarily long. 5. LAYER 4 - USER PROCESSES This layer contains init, the shell, the editor, the compiler, the utilities, and all the user processes. These processes may only send messages to the memory manager and the file system, and these servers only accept valid system call requests. Thus the user processes do not perceive MINIX to be a general-purpose message passing system. How- ever, removing the one line of code that checks if the mes- sage destination is valid would convert it into a much more general system (but less UNIX-like).